Combining Linguistics and Statistics for High-Quality Limited Domain English-Chinese Machine Translation
Abstract
Second language learning is a compelling activity in today’s global markets. This thesis focuses on the critical technology needed to produce a computer spoken translation game for learning Mandarin Chinese in a relatively broad travel domain. Three main aspects are addressed: efficient Chinese parsing, high-quality English-Chinese machine translation, and how these technologies can be integrated into a translation game system. In the language understanding component, the TINA parser is enhanced with bottom-up and long-distance constraint features. The results showed that with these features, the Chinese grammar ran ten times faster and covered 15% more of the test set. In the machine translation component, a method combining linguistic and statistical systems is introduced. English-Chinese translation is done via an intermediate language, “Zhonglish”. English-Zhonglish translation is accomplished by a parse-and-paraphrase paradigm using hand-coded rules, mainly for structural reconstruction. Zhonglish-Chinese translation is accomplished by a standard phrase-based statistical machine translation system, which mostly handles word sense disambiguation and lexicon mapping. We evaluated the system on an independent test set from the IWSLT travel-domain spoken language corpus. Substantial improvements were achieved for GIZA alignment crossover: we obtained a 45% decrease in crossovers compared to a traditional phrase-based statistical MT system. Furthermore, the BLEU score improved by 2 points. Finally, a framework for the translation game system is described, and the feasibility of integrating the components to produce reference translations and to automatically assess students’ translations is verified.
Thesis Supervisor: Stephanie Seneff
Title: Principal Research Scientist
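The abstract does not say how alignment crossovers are counted. As a minimal sketch, assuming a crossover is any pair of word-alignment links whose source order and target order disagree, the count for one sentence pair could be computed as below. The function name `count_crossovers` and the `(source_index, target_index)` link representation are illustrative assumptions, not the thesis's own code or the output format of any particular aligner.

```python
from itertools import combinations


def count_crossovers(links):
    """Count crossing pairs among word-alignment links (hypothetical helper).

    `links` is an iterable of (source_index, target_index) pairs.
    Two links cross when one precedes the other on the source side
    but follows it on the target side.
    """
    unique_links = sorted(set(links))
    crossings = 0
    for (s1, t1), (s2, t2) in combinations(unique_links, 2):
        # Opposite signs mean the two links are ordered differently
        # on the source and target sides.
        if (s1 - s2) * (t1 - t2) < 0:
            crossings += 1
    return crossings


# A monotone alignment has no crossings; a reordered one does.
monotone = [(0, 0), (1, 1), (2, 2)]
reordered = [(0, 2), (1, 0), (2, 1)]
print(count_crossovers(monotone))   # 0
print(count_crossovers(reordered))  # 2
```

Under this reading, a source side that has already been restructured into target-like word order would yield mostly monotone links and therefore fewer crossings.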
Similar resources
Combining Linguistic and Statistical Methods for Bi-directional English Chinese Translation in the Flight Domain
In this paper, we discuss techniques to combine an interlingua translation framework with phrase-based statistical methods, for translation from Chinese into English. Our goal is to achieve high-quality translation, suitable for use in language tutoring applications. We explore these ideas in the context of a flight domain, for which we have a large corpus of English queries, obtained from user...
Exploiting alignment techniques in MATREX: the DCU machine translation system for IWSLT 2008
In this paper, we give a description of the machine translation (MT) system developed at DCU that was used for our third participation in the evaluation campaign of the International Workshop on Spoken Language Translation (IWSLT 2008). In this participation, we focus on various techniques for word and phrase alignment to improve system quality. Specifically, we try out our word packing and syn...
Matrex: the DCU machine translation system for IWSLT 2007
In this paper, we give a description of the machine translation system developed at DCU that was used for our second participation in the evaluation campaign of the International Workshop on Spoken Language Translation (IWSLT 2007). In this participation, we focus on some new methods to improve system quality. Specifically, we try our word packing technique for different language pairs, we smoo...
The DCU machine translation systems for IWSLT 2011
In this paper, we provide a description of Dublin City University’s (DCU) submissions in the IWSLT 2011 evaluation campaign. We participated in the Arabic-English and Chinese-English Machine Translation (MT) track translation tasks. We use phrase-based statistical machine translation (PBSMT) models to create the baseline system. Due to the open-domain nature of the data to be translated, we...
A Study on the Formalization of English Subjunctive Mood
One of the main problems affecting the quality of machine translation is how to express linguistic knowledge precisely. The subjunctive mood is a very common phenomenon in English. From the perspective of E-C machine translation, and based on the theory of Semantic Element (SE) in Unified Linguistics, the paper discusses the specific formalization methods for each type of the En...